Self-organization and missing values in SOM and GTM

Authors

  • Tommi Vatanen
  • M. Osmala
  • Tapani Raiko
  • Krista Lagus
  • Marko Sysi-Aho
  • Matej Oresic
  • Timo Honkela
  • Harri Lähdesmäki
Abstract

In this paper, we study fundamental properties of the Self-Organizing Map (SOM) and the Generative Topographic Mapping (GTM), the ramifications of how the algorithms are initialized, and their behavior in the presence of missing data. We show that the commonly used principal component analysis (PCA) initialization of the GTM does not guarantee good learning results with high-dimensional data. Initializing the GTM with the SOM is shown to improve self-organization on three high-dimensional data sets: the commonly used MNIST and ISOLET data sets and the epigenomic ENCODE data set. We also propose a revised way of handling missing data in the batch SOM algorithm, called the Imputation SOM, and show that the new algorithm is more robust in the presence of missing data. We benchmark the performance of the topographic mappings in the missing value imputation task and conclude that there are better methods for this particular task. Finally, we announce a revised version of the SOM Toolbox for Matlab with added GTM functionality. © 2014 Elsevier B.V. All rights reserved.

Similar resources

Missing data imputation through Generative Topographic Mapping as a mixture of t-distributions: Theoretical developments

The Generative Topographic Mapping (GTM) was originally conceived as a probabilistic alternative to the well-known, neural network-inspired, Self-Organizing Map (SOM). The GTM can also be interpreted as a constrained mixture of distributions model. In recent years, much attention has been directed towards Student t-distributions as an alternative to Gaussians in mixture models due to their robu...

Experimental Analysis of GTM

Nonlinear methods for statistical data analysis have become more and more popular thanks to the rapid development of computers. The fields in which they are applied are as varied as the methods themselves. Generative topographic mapping (GTM) has been developed by [Bishop et al. 1997] as a principled alternative to the self-organizing map (SOM) algorithm [Kohonen 1982] in which a set of unla...

S-Map: A Network with a Simple Self-Organization Algorithm for Generative Topographic Mappings

The S-Map is a network with a simple learning algorithm that combines the self-organization capability of the Self-Organizing Map (SOM) and the probabilistic interpretability of the Generative Topographic Mapping (GTM). The simulations suggest that the S-Map algorithm has a stronger tendency to self-organize from a random initial configuration than the GTM. The S-Map algorithm can be further simpl...

Developments of the generative topographic mapping

The Generative Topographic Mapping (GTM) model was introduced by [7] as a probabilistic re-formulation of the self-organizing map (SOM). It offers a number of advantages compared with the standard SOM, and has already been used in a variety of applications. In this paper we report on several extensions of the GTM, including an incremental version of the EM algorithm for estimating the model para...

Missing data imputation through GTM as a mixture of t-distributions

The Generative Topographic Mapping (GTM) was originally conceived as a probabilistic alternative to the well-known, neural network-inspired, Self-Organizing Maps. The GTM can also be interpreted as a constrained mixture of distribution models. In recent years, much attention has been directed towards Student t-distributions as an alternative to Gaussians in mixture models due to their robustnes...

Journal:
  • Neurocomputing

Volume 147

Publication date: 2015